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A new quantile regression concept, based on a directional version of Koenker and Bassett’s 
traditional single-output one, has been introduced in [Ann. Statist. (2010) 38 635-669] for 
multiple-output location/linear regression problems. The polyhedral contours provided by the 
empirical counterpart of that concept, however, cannot adapt to unknown nonlinear and/or 
heteroskedastic dependencies. This paper therefore introduces local constant and local linear 
(actually, bilinear) versions of those contours, which both allow to asymptotically recover the 
conditional halfspace depth contours that completely characterize the response’s conditional 
distributions. Bahadur representation and asymptotic normality results are established. Illus¬ 
trations are provided both on simulated and real data. 
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1. Introduction 

1.1. Quantile/depth contours: From multivariate location to 
multiple-output regression 

A multiple-output extension of Koenker and Bassett’s celebrated concept of regression 
quantiles was recently proposed in Hallin, Paindaveine, and Siman [18] (hereafter HPS). 
That extension provides regions that are enjoying, at population level, a double inter¬ 
pretation in terms of quantile and halfspace depth regions. In the empirical case, those 
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regions are limited by polyhedral contours which can be computed via parametric linear 
programming techniques. 

Those results establish a strong and quite fruitful link between two seemingly unre¬ 
lated statistical worlds - on one hand the typically one-dimensional concept of quantiles, 
deeply rooted into the strong ordering features of the real line and Li optimality, with 
linear programming algorithms, and traditional central-limit asymptotics; the intrinsi¬ 
cally multivariate concept of depth on the other hand, with geometric characterizations, 
computationally intensive combinatorial algorithms, and nonstandard asymptotics. From 
their relation to depth, quantile hyperplanes and regions inherit a variety of geometric 
properties - connectedness, nestedness, convexity, affine-equivariance ... while, via its 
relation to quantiles, depth accedes to L± optimality, feasible linear programming algo¬ 
rithms, and tractable asymptotics. 

The HPS approach, however, is focused on the case of i.i.d. m-variate observations 
Yi,...,Y„, and the quantile/depth contours they propose provide a consistent recon¬ 
struction of the corresponding population contours in R m - call them unconditional or 
location contours. In the presence of covariates Xi,..., X n , with X, = (1, W')', the ob¬ 
jective of the statistical analysis is a study of the influence of the covariate(s) W on the 
response Y, that is, a study of the distribution of Y conditional on W. The contours 
of interest, thus, are the collection of the population conditional quantile/depth contours 
of Y, indexed by the values w £ R p_1 of W that is, the collection of location (p = 1) 
quantile/depth contours associated with the conditional (on W = w) distributions of Y. 

An apparently simple solution would consist in introducing the covariate values w 
into the linear equations that characterize (via the minimization of an Li criterion) the 
HPS contours. The resulting regions and contours, unfortunately, in general carry little 
information about conditional distributions, and rather produce some averaged (over the 
covariate space) quantile/depth contours - the only exception being the overly restrictive 
case of a linear regression relation between the response and the covariates, under which, 
for some b £ R p , the distribution of Y — (1, w')b conditional on W = w does not depend 
on w £ R p_1 . 

This problem is not specific to the multiple-output context and, in the traditional 
single-output setting, it has motivated weighted, local polynomial and nearest-neighbor 
versions of quantile regression, among others. We refer to [43-45] for conceptual insight 
and practical information, to [4, 9, 17, 19, 28, 47] for some recent asymptotic results, and 
to [2, 6, 14, 16, 22-24, 38] for some less recent ones. 

Our objective in this paper is to extend those local estimation ideas to the HPS concept 
of multiple-output regression quantiles. Since local constant and local linear methods 
have been shown to perform extremely well in the single-output single-regressor case (Yu 
and Jones [44]), we will concentrate on local constant and local bilinear approaches - in 
the multiple-output context, indeed, it turns out that the adequate extensions of locally 
linear procedures are of a bilinear nature. Just as in the single-output case, the local 
methods we propose in this paper do not require any a priori knowledge of any trend 
and - see [30] for details - asymptotically characterize the conditional distributions of Y 
given W = w for any w £ R 1 *” 1 . The final result is thus much more informative on the 
dependence of Y on the covariates than any standard linear or local polynomial mean 
regression. 
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It should be clear, however, that our methods, as well as other local nonparametric 
methods, do not escape the curse of dimensionality, and will run into problems in the 
presence of high-dimensional regressors. It follows indeed from the asymptotic results of 
Section 5 and, more particularly, from the rates in Theorem 5.2, that consistency rates 
are affected by p but not by m. 

Growth chart applications (with (p — 1) = 1) do not suffer this drawback, as only 
univariate kernels are involved. Growth charts (reference curves, percentile curves) have 
been used for a long time by practitioners in order to assess the impact of regressors on 
the quantiles of some given univariate variable of interest, and several methods have been 
developed (see, e.g., [3, 8, 40, 42], and the references therein), including single-response 
quantile regression (see [15, 41]). Much less results are available in the multiple-output 
case, with a recent proposal by Wei [39] , who defines a new concept of dynamic multiple- 
output regression contours generalizing single-output proposals by [4], [25] and [40]. These 
contours, however, do not have the nature and interpretation of (conditional) depth 
contours. They enjoy interesting conditional coverage probability properties (without 
any “minimal volume” or “maximal accuracy” features, though) but rely on a sequential 
conditioning of response components, and crucially depend on the order adopted for 
that conditioning. Their empirical versions are equivariant under marginal location-scale 
transformations of the response, but they are neither affine- nor rotation-equivariant. 
Our methodology, which is based on entirely different principles, appears as a natural 
alternative (see [32] for a real-data example of bivariate growth charts based on the 
methods we are describing here), yielding affine-equivariant regression contours with 
well-accepted conditional depth interpretation; moreover, we provide consistency and 
asymptotic distributional results. 

1.2. Motivating examples 

As a motivating example, we generated n = 999 points from the model 

(Yi,Y 2 ) = (W, W 2 ) + (l + | (sin (^) ) ) e, 

with W ~ C/([—2,2]) independent of the bivariate standard normal vector e. In Fig¬ 
ure 1, we are plotting the r = 0.2 and r = 0.4 HPS regression quantile/depth contours 
obtained by using the covariate vector X= (1 ,W)' (Figure 1(a)) and the covariate 
vector X = (1, W, W 2 )' (Figure 1(b)) in the equations of the quantile/depth hyper¬ 
planes of the (global) HPS mehod. More precisely, these figures provide the intersec¬ 
tions of the HPS contours with hyperplanes orthogonal to the u>-axis at fixed ic-values 
— 1.89, —1.83, —1.77,..., 1.89. 

Clearly, the results are very poor: Figure 1(a) neither reveals the parabolic trend, nor 
the periodic heteroskedasticity pattern in the data. Although it is obtained by fitting 
the “true” regression function, Figure 1(b), while doing much better with the trend, 
still fails to catch heteroskedasticity correctly. Instead of providing genuine conditional 
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Figure 1 . For n = 999 points following the model (Yi, Y 2 ) = (W, W 2 ) + (1 + §(sin(f W)) 2 )e, 
where W ~ {/([—2,2]) and e ~A/”(0, l) 2 are independent, the plots above show the intersections, 
with hyperplanes orthogonal to the w-axis at fixed w-values —1.89, —1.83, —1.77,... , 1.89, of (a) 
the HPS regression quantile regions with the single random regressor W, (b) the HPS regression 
quantile regions with random regressors W and W 2 , and (c)-(d) the proposed local constant 
and local bilinear regression quantile regions (in each case, r = 0.2 and r = 0.4 are considered). 
For the sake of comparison, the corresponding population (conditional) halfspace depth regions 
are provided in (e). The conditional scale function toi->l + |(sin^ui)) 2 is plotted in (f). Local 
methods use a Gaussian kernel and bandwidth value H = 0.37, and 360 equispaced directions 
u £ 5 1 were used to obtain results in (d). 
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Figure 2. For n = 499 points following the model (Yi,1 2 ) = (W , W 2 ) + (1 + |(sin(fVF)) 2 )e, 
where W ~ C/([—2,2]) and e ~ A/"(0, l) 2 are independent, the plots above show the intersections, 
with hyperplanes orthogonal to the w-axis at fixed w-values —1.89, —1.83, —1.77,... , 1.89, of (a) 
the HPS regression quantile regions with the single random regressor W, (b) the HPS regression 
quantile regions with random regressors W and W 2 , and (c)-(d) the proposed local constant 
and local bilinear regression quantile regions (in each case, r = 0.2 and r = 0.4 are considered). 
For the sake of comparison, the corresponding population (conditional) halfspace depth regions 
are provided in (e). The conditional scale function toi->l + |(sin^ui)) 2 is plotted in (f). Local 
methods use a Gaussian kernel and bandwidth value H = 0.37, and 360 equispaced directions 
u £ 5 1 were used to obtain results in (d). 
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quantile/depth contours, the “global” HPS methodology produces some averaged (over 
the w values) contours. 

In contrast, the contours obtained from the local constant and local bilinear meth¬ 
ods proposed in this paper - without exploiting any a priori knowledge of the actual 
regression function - exhibit a very good agreement with the population contours (see 
Figure l(c)-(e) to which we refer for details); both the parabolic trend and the periodic 
heteroskedascticity features now are picked up quite satisfactorily. Note that, compared 
to the local constant approach, the local bilinear one does better, as expected, close to 
the boundary of the regressor space (in particular, the local constant approach is missing 
the decay of the conditional scale when w converges to —2). 

Similar comments remain valid for smaller sample sizes; see Figure 3, based on a sample 
of n = 499 data points. 

A second example is contrasting a homoskedastic setup and a heteroskedastic one. 
More specifically, we generated n = 999 points from the homoskedastic model (kj, Y?) = 
{W.W 2 ) + e and from the heteroskedastic one (kj, Y 2 ) = (W,W 2 ) + (1 + W 2 )e , where 
W ~ U ([—2,2]) and e ~ jV(0,1/4) 2 are mutually independent. As above, the intersections 
of the resulting contours with hyperplanes orthogonal to the ic-axis at fixed w- values are 
provided. Figure 3 shows those intersections for the local constant and local bilinear 
quantile contours associated with w £ {—1.89,—1.83,—1.77,..., 1.89}, for r = 0.2 and 
t = 0.4. As in the previous example, those sample contours approximate their population 
counterparts (shown in Figure 3(e) and (f)) remarkably well. In particular, the inner 
regions mimic the trend faithfully even for quite extreme regressor values. Again, the local 
bilinear method seems to provide a much better boundary behavior than its local constant 
counterpart; in the heteroskedastic case, the latter indeed severely underestimates the 
conditional scale for extreme values of W. 

1.3. Relation to the depth and multivariate quantile literature 

As already explained, this work is lying at the intersection of two distinct, if not unrelated, 
strands of the statistical literature - namely (i) statistical depth and (ii) multivariate 
quantiles. Under both strands, definitions have been proposed for unconditional concepts, 
that is, for statistical models that do not involve covariates. When covariates are present, 
the focus is shifted from unconditional features to conditional ones. The main objective, 
indeed, now is the analysis of the dependence of a response Y on a set of covariates 
X, that is, a study of the distributions of Y conditional on the values x of X in its 
broadest sense, the regression problem - and various attempts have been made to propose 
regression versions of (unconditional) depth or quantile concepts, respectively. 

Now, if a study of the dependence on x of the distributions of Y conditional on 
X = x is the main objective, conditional depth and conditional (multivariate) quantiles, 
associated with the distributions of Y conditional on X = x, are or should be the concepts 
of interest. Not all definitions of regression depth or (multiple-output) regression quantiles 
are meeting that requirement, though. Nor do they all preserve, conditionally on X = x, 
the distinctive properties of a depth/quantile concept. In contrast with this, the concept 
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Figure 3. Local multiple-output quantile regression with Gaussian kernel and ad-hoc band¬ 
width H = 0.37: cuts through w € {-1.89,-1.83, —1.77,... , 1.89} for r = 0.2 and r = 0.4 corre¬ 
sponding to n = 999 random points drawn from a homoskedastic model (Yi, I 2 ) = {W, W 2 ) + e 
((a), (c)) or a heteroskedastic model (YijYb) = (W, W 2 ) + (1 + W 2 )e ((b), (d)), where 
W ~ [/([—2,2]) and e ~ A/"(0,1/4) 2 are independent. The plots are showing the intersections, 
with hyperplanes orthogonal to the w-axis at fixed w-values, of the contours obtained either 
from the local constant method ((a), (b)) or the local bilinear one ((c), (d)). Color scaling of the 
points (resp., the intersections) mimics their regressor values, whose higher values are indicated 
by lighter red (resp., lighter green). For the sake of comparison, the population (conditional) 
halfspace depth regions are provided in (e) and (f). A color version of this figure is more readable, 
and can be found in the on-line edition of the paper. 
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we are proposing in this paper, being the conditional version of the unconditional HPS 
concept, enjoys all the properties that are expected from a conditional depth/quantile 
concept, while fully characterizing the conditional distributions of Y. 

1.3.1. Regression depth 

An excellent summary of depth-related problems is provided in Serfling [37] , which further 
clarifies the nature of depth by placing it in the broader perspective of the so-called 
DOQR paradigm , relating Depth to the companion concepts of Outlyingness, Quantiles, 
and Ranks. To the best of our knowledge, this paradigm never has been considered in a 
conditional (regression) context, but it seems quite desirable that any regression depth 
concept should similarly be placed, conditionally, in the same DOQR perspective. 

The celebrated regression depth concept by Rousseeuw and Hubert [35], for instance, 
does not bear any direct relation to conditional depth and the DOQR paradigm. Rather 
than the depth of a point in the observation space, that concept aims at defining, via 
non-fits and breakdown values, the depth of a (single-output) regression hyperplane. A 
multiple-output version is considered in Bern and Eppstein [1], Similarly, an elegant 
general theory has been developed by Mizera [33] who, in the context of a general para¬ 
metric model, defines the depth of a parameter value. Again, the approach and, despite 
the terminology, the concept, is of a different nature, unrelated to any conditional depth. 
Extensions to a nonparametric regression setting, moreover, seem problematic. 

Kong and Mizera [29] propose an approach to unconditional depth, based on projection 
quantiles, which provides an approximation to the unconditional halfspace depth contours 
- see [18] and [29]. Although an application to bivariate growth charts is briefly described, 
in which a local smoothing, based on regression spline techniques, of their unconditional 
concept is performed (little details are provided), the regression setting is only briefly 
touched there. In particular, no asymptotic analysis of the type we are providing in 
Section 5 is made available. 

1.3.2. Multivariate regression quantiles 

Turning to conditional multivariate or multiple-output regression quantile issues, much 
work has been devoted to the notion of spatial regression quantiles; see, essentially, 
Chakraborty [5] for linear and Cheng and De Gooijer [7] for nonparametric regression. De¬ 
spite a strong depth flavor, those spatial quantiles and spatial regression quantiles, how¬ 
ever, intrinsically fail to be afBne-equivariant; Chakraborty [5] defines affine-equivariant 
spatial quantiles for linear regression via a transformation-retransfornration device, but, 
to the best of our knowledge, there exists no affine-equivariant version of spatial quantiles 
for general nonparametric regression. 

For the sake of completeness, one also should mention here the closely related literature 
on growth charts described at the end of Section 1.1, which, besides a lack of affine- 
invariance, essentially fails, in the multiple-output case, to address the conditional nature 
of the regression quantile concept it is dealing with. 
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The rest of this paper is organized as follows. Section 2 defines the (population) condi¬ 
tional regression quantile/depth regions and contours we would like to estimate in the 
sequel. This estimation will make use of (empirical) weighted multiple-output regression 
quantiles, which we introduce in Section 3. Section 4 explains how these weighted quan¬ 
tiles lead to local constant (Section 4.2) and local bilinear (Section 4.3) depth contours. 
Section 5 provides asymptotic results (Bahadur representation and asymptotic normal¬ 
ity) both for the local constant and local bilinear cases. Section 6 deals with the practical 
problem of bandwidth selection. In Section 7, the usefulness and applicability of the 
proposed methods are illustrated on real data. Finally, the Appendix collects proofs of 
asymptotic results. 


2. Conditional multiple-output quantile/depth 
contours 


Denote by (X', Y')' = {Xu,.. .,X ip , Yu,... ,Y irn )', i = l,...,n, an observed n-tuple of 
independent copies of (X', Y')', where Y := {Y\,... ,Y m )' is an m-dimensional response 
and X := (1, W')' a p-dimensional random vector of covariates. For any r £ (0,1) and 
any direction u in the unit sphere S m ~ 1 of the m-dimensional space of the response 
Y, the HPS concept produces a hyperplane 7iy u (iTtu in the empirical case) which is 
defined as the classical Koenker and Bassett regression quantile hyperplane of order r 
once (Op-i.u')' has been chosen as the “vertical direction” in the computation of the 
relevant Li deviations. 

More specifically, decompose y G R m into (u'y)u + T u (r( 1 y), where T u is such that 
(u, T u ) is an m x m orthogonal matrix; then the directional quantile hyperplanes 7r TU 

(n) 

and 7r)-u are the hyperplanes with equations 


U V — c^r^y — a(_(l, w'/ = 0 and u'y - c^. n) T„y - a(. n) '(l, w')' = 0 
(w G M p_1 ) minimizing, with respect to c G M m_1 and a G R p , 

n 

Y[pr (u'y - cT' u Y - a'X)] and ]T p T { u'Y t - cT' u Y 4 - a'X,), 

i—1 

respectively, where £ p T (()i with 

Pt(0 : = C( T - 4[C < 0]) = max{(r - !)(,<} = (|CI + (2r - 1)0/2, C G ® 


( 2 . 1 ) 


( 2 . 2 ) 


(2.3) 


as usual denotes the well-known r-quantile check function. HPS moreover show that 7r TU 
and 7 Tru can equivalently be defined, in a more symmetric way, as the hyperplanes with 
equations 

b;y-a;(l,w')' = 0 and b^'y - w')' = 0, 


(2.4) 
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minimizing, with respect to b £ R m satisfying b'u = 1 and a £ R p , the Li criteria 


E[p T (b'Y-a'X)] and ^ p T (b'Y* - a'X;), (2.5) 

i=l 


respectively. 

For p = 1, the multiple-output regression model reduces to a multivariate location one: 
a T and at"- 1 reduce to scalars, a T and ai"\ while the equations describing 7r TU and tTtu 
take the simpler forms 

u'y - c;r' u y - a T = 0 and u'y - c^T^y - a ^ = 0, (2.6) 

respectively. Those location quantile hyperplanes 7r ru and are studied in detail in 
HPS, where it is shown that their fixed-r collections characterize regions and contours 
that actually coincide with the Tukey halfspace depth ones. Consistency, asymptotic 
normality and Bahadur-type representation results for the ’s are also provided there, 
together with a linear programming method for their computation. 

The objective here is an analysis of the distribution of Y conditional on W, that is, 
of the dependence of Y on W in strong contrast with traditional regression, where 
investigation is limited to the mean of Y conditional on W. The relevant quantile hy¬ 
perplanes, depth regions and contours of interest are the location quantile/depth hy¬ 
perplanes/regions/contours associated (in the sense of HPS) with the m-dimensional 
distributions of Y conditional on W - more precisely, with the distributions p Y l w=w o 
of Y conditional on W = wo (wo £ R p_1 ). We now carefully define these objects - call 
them wq -conditional t- quantile or depth hyperplanes, regions and contours. 

Let r £ (0,1) and u £ S m ~ l := {u £ R m : ||u|| = 1} (the unit sphere in R m ), and write 
t := ru. Denoting by wp some fixed point of R p_1 at which the marginal density / w of 
W does not vanish (in order for the distribution of Y conditional on W = wo to make 
sense), define the extended and restricted wo- conditional r -quantile hyperplanes ofY as 
the (to + p — 2)-dimensional and (to — l)-dimensional hyperplanes 

TT-riwo := {(w', y')' £ R p_1 x R m | b(_. Wo y - a T;w „ = 0} (2.7) 

and 

7Tx;w 0 := (K, y')' £ RP- 1 x R m I b(_ ;WQ y - a T;Wo = 0}, (2.8) 

respectively, where a.,- ;Wo and bT- ;Wo minimize 

'k r ;w 0 (a, b) := E[p r (b , Y — a) \ W = w 0 ] subject to b'u = 1, (2.9) 

with the check function p T defined in (2.3). Comparing (2.9) with (2.5) immediately 
shows that 7 Tt- ;Wo is the (to — l)-dimensional (location) r-quantile hyperplane of Y as¬ 
sociated with the distribution of Y conditional on W = wo- Of course, 7 r r:Wo is also the 
intersection of 7r T;Wo with the m-dimensional hyperplane C Wo := {(wp,y')' | y £ R m }. 
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This, and the fact that n T:Wo is “parallel to the space of covariates” (in the sense that if 
( w o,yo)' £ 7r t ;W o , then (w',yg)' £ 7iv ;Wo for all w), fully characterizes 7r T:v/0 . 

Associated with 7iv :Wo are the extended upper and lower wp -conditional t- quantile 
halfspaces 


H+ Wo := {(w',y')' £ R p_1 x | b; ;Wo y - a T;Wo > 0} 

and 

H" Wo := {(w',y')' £ R p_1 x | b!/ :Wo y - a x;wo < 0}, 
with the extended (cylindrical) wo -conditional quantile/depth regions 

Rwo (r):= f| {H+ u;Wo } (2.10) 

ues ™- 1 


and their boundaries <9R Wq (t), the extended wq- conditional quantile/depth contours. The 
intersections of those extended regions R Wo (r) (resp., contours <9R Wo (r)) with C Wo are 
the restricted wo-conditional quantile/depth regions i? Wo (r) (resp., contours 91? Wo (r)), 
that is, the location HPS regions (resp., contours) for Y, conditional on W = wo. It 
follows from HPS that those regions are compact, convex, and nested. As a consequence, 
the regions R Wo (r) also are closed, convex, and nested. 

Finally, define the nonparametric t- quantile/depth regions as 

R ( r ) : = U R w 0 (t)= |J (Rw 0 (r)nC Wo ) 

wqGK p_ 1 woGl p_1 


and write <9R(r) for their boundaries. The regions R(r) are still closed and nested but 
they adapt to the general dependence of Y on W: in particular, 9R(r), for any r, 
goes through all corresponding 9i? Wo (r)’s, w 0 £R P_1 . Consequently, the regions R(t) 
in general are no longer convex. 

The fixed-wo collection (over r £ (0,1/2)) of all w 0 -conditional location quantile/depth 
contours 9R Wo (r) (which, by construction, are the intersections of <9R(r) with the “ver¬ 
tical hyperplanes” C Wo ) will be called a wp- quantile/depth cut or wp -cut. Such cuts are 
of crucial interest, since they characterize the distribution of Y conditional on W = wo, 
hence provide a full description of the dependence of the response Y on the regressors 
W. Note that the nonparametric contours 9R(r), via the location depth interpretation, 
for fixed wo, of the 9R Wo (r)’s, inherit a most interesting interpretation as “regression 
depth contours”. Clearly, this concept of regression depth, that defines regression depth 
of any point (w',y')' £ K m+P_1 , is not of the same nature as the regression depth con¬ 
cept proposed in [35], that defines the depth of any regression “fit” (i.e., of any regression 
hyperplane). 
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3. Weighted multiple-output empirical quantile 
regression 


Under the assumption of absolute continuity, the number of observations, in a sample of 
size n, belonging to C Wo clearly is (a.s.) zero, which implies that no empirical version of 
the conditional regression hyperplanes (2.7) or (2.8) can be constructed. If nonparametric 
r-quantile/depth regions or contours, or simply some selected cuts, are to be estimated, 
local smoothing techniques have to be considered. Those techniques typically involve 
weighted versions, with sequences Wwj = (w^,;, i = 1,..., n) of weights, of the empirical 
quantile regression hyperplanes developed in HPS. In this section, we provide general 
definitions and basic results for such weighted concepts, under fixed sample size n and 
weights ujp, see [21] for another approach combining weights with halfspace depth. In 
Section 4, we will consider the data-driven weights to be used in the local approach. 

Consider a sample of size n, with observations (X', Y')' = ((1, W'), Y')', i = 1,..., n, 
along with n nonnegative weights u>i satisfying (without any loss of generality) X^=i Wi = 
n ( u>i = 1 then yields the unweighted case). The definitions of HPS extend, mutatis 
mutandis, quite straightforwardly, into the following weighted versions. The coefficients 
a-r-u £ R p and b T / £ R m of the weighted empirical r-quantile hyperplane 


:= {(w', y')' £ M”" 1 x | b^y - a^'(l, w')' = 0} 


(3.1) 


(an (m + p — 2)-dimensional hyperplane) are defined as the minimizers of 


1 

4'(."2(a,b) := — Wip T (b'Y ! , — a'Xd subject to b'u = 1. 
’ n 

i=l 


(3.2) 


As usual in the empirical case, the solution may not be unique, but the minimizers 
always form a convex set. When substituted for the it-r-.wo ’s in the definitions of upper 
and lower conditional r-quantile halfspaces, those also characterize upper and 

lower weighted x-quantile halfspaces and , with weighted r-quantile/depth 

regions and contours 


Rl”V):= D and OR £">(r), 

ues ™- 1 


respectively. Note that the objective function in (3.2) rewrites as 

1 " 

*£"2(a,b) = -y fr (b'Y, ;w -a'X, ;u ), 

’ n ' 

*=l 

with Xi ;w :=cc,;Xi and Y, ;w :=oJjYj. As an important consequence, the weighted quan¬ 
tile/depth hyperplanes, contours and regions can be computed in the same way as their 
non-weighted counterparts because the corresponding algorithm in [34] allows to have 
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(Xj)i ^ 1. Due to quantile crossing, however, and contrary to the population regions and 
contours defined in the previous section, the rL"^(t)’s need not be nested for p > 2; if 
nestedness is required, one may rather consider the regions R^n(r) := 

The necessary sample subgradient conditions for (ai"^, can be derived as in the 

unweighted case. They state in particular that 


1 

n 


E W ‘ J [ bi7i , Y.-a^i / X i <0]<r< 

i= 1 




^I[ b^T-a^X^O], 


i=l 


(n) — 

which controls the probability contents of HV ; i with respect to the distribution putting 
probability mass uii/n on (W',Y'y, i = l,...,n. The width of this interval depends 
only on the weights uji associated with those data points (W', Y')' that belong to 7ri”i- 
Another consequence worth mentioning is that there always exists a tvtu-u hyperplane 
containing at least (m +p — 1) data points of the form (Wj, Y*). With probability one, 
thus, the intersection defining the regions rL" (r) is finite. 

Note that, unlike the extended conditional quantile hyperplanes (2.7), the weighted 
empirical quantile hyperplanes (3.1) involve an unrestricted coefficient a £ R p . As a con¬ 
sequence, 7ri”i is not necessarily parallel to the space of covariates (as defined in page 
11). That degree of freedom will be exploited in the local linear approach described in 
Section 4.3 (in an augmented regressor space, though, which makes it bilinear rather 
than linear). If we impose the additional constraint a = (ai, 0,..., 0)' in (3.1) and (3.2), 
we obtain hyperplanes of the form 

■■= {(w',y')' 6 R p_1 x | bW'y - flW ;w = 0}. (3.3) 

The corresponding minimization problem yields hyperplanes that are parallel to the space 
of covariates, hence “horizontal” cylindrical regions and contours, to be considered in the 
local constant approach of Section 4.2. 

Finally, it should be pointed out that (y and/or w)-affine-invariant weights u>i := 
w ( w i,yi) yield weighted quantile/deptli hyperplanes, regions, and contours with good (y 
and/or w)-affine-equivariance properties. 


4. Local quantile/depth regression 

4.1. From weighted to local quantile/depth regression 

The weighted quantiles of Section 3 have an interest on their own. They can be used 
for handling multiple identical observations (allowing, for instance, for bootstrap proce¬ 
dures), or for downweighting observations that are suspected to be outliers or leverage 
points. Above all, weighted regression quantiles allow for a nonparametric approach to 
regression quantiles that will take care of the drawbacks of the unweighted approach 
of HPS (see the example considered in Section 1.2). In particular, adequate sequences 
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of weights will allow to estimate the conditional contours described in Section 2, thus 
extending to the multiple-output case the local constant and local linear approaches to 
quantile regression proposed, for example, by [43, 44] in the single-output context. 

The basic idea is very standard: in order to estimate Wo-conditional quantile/depth 
hyperplanes, regions or contours, we will consider weighted quantile/depth hyperplanes, 
regions or contours, with sequences of weights := wt'j (W/) based on weight functions 
of the form 

wnwW(w) ^^(^(w-wq)), (4.1) 

where h n is a sequence of positive bandwidths and K a nonnegative kernel function over 
R p_1 . The literature proposes a variety of possible kernels, and there is no compelling rea¬ 
son for not considering the most usual, such as the rectangular (uniform), Epanechnikov 
or (spherical) Gaussian ones. 

Since we typically intend, for any fixed r G (0,1), to compute by means of parametric 
programming the directional quantile hyperplanes for all u G S m ~ x , we should use the 
same weights for all of them. This is why we only consider u-independent bandwidths. 
However, exact computation of all quantiles (for each fixed r) is possible in the local 
constant case, but not in the local bilinear one. In the latter case, depth contours will be 
approximated by sampling the unit sphere (in Figures 1 and 2, for instance, 360 directions 
were sampled uniformly over the unit circle), which of course would allow u-dependent 
bandwidths if desired. 

4.2. Local constant quantile/depth contours 

The above weighting scheme can be applied in the computation of the weighted cylindrical 
regions generated by the hyperplanes in (3.3); more precisely, these cylindrical regions, 
with edges parallel to the space of covariates, are obtained by computing the intersection 
(over all u’s, for fixed r) of the upper quantile halfspaces associated with the quantile 
hyperplanes in (3.3); see Figure 4(a). 

The intersection with the w = wp hyperplane of these cylindrical regions yields 
a local constant estimate, dR^o const (r) say, of the corresponding population Wo-cut 
9i? w0 ( T ); see Section 5 for asymptotic results. Of course, the resulting local constant 
r-quantile/depth contours, namely 

9R(”) const (r) := |J dR^ o const (r), 

wogRp - 1 

are not (globally) cylindrical, but rather adapt to the underlying possibly nonlinear 
and/or heteroskedastic dependence structures. 

This approach, which constitutes a generalization of the local constant approach 
adopted elsewhere for single-output regression, has many advantages. The main one is 
parsimony: each quantile hyperplane involved in the construction only entails m param¬ 
eters, which is strictly less than in the local bilinear approach of the next section. On the 
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Figure 4. Construction of (a) the local constant and (b) the local bilinear r-quantile regions 
as described in Sections 4.2 and 4.3. 


other hand, the local constant approach does not provide any information on, nor does 
take any advantage of, the behavior of w-cuts for w values in the neighborhood of wo, 
and its boundary performances are likely to be poor. These two reasons, in traditional 
contexts, have motivated the development of local linear and local polynomial methods; 
see [10] for a classical reference. Local linear methods were successfully used in single¬ 
output quantile regression ([43-45, 47]). Considering them in the present context, thus, 
is a quite natural idea. 


4.3. Local bilinear quantile/depth contours 


Assume that the distribution of (W', Y')' is smooth enough that the coefficients of w- 
conditional quantile hyperplanes are differentiable with respect to w. Getting back to 
the first characterization (2.1) and (2.2) of quantile hyperplanes, the (restricted) wo- 
conditional x-quantile hyperplane of Y defined in (2.8) and (2.9) has equation (in y 
of course, in w, we just have w = Wq) 


a y (utj Wq 



(4.2) 


The same hyperplane equation, relative to a point w in the neighborhood of wo, takes 
the form 


a y (u-7-;wq 



- (w-w 0 )'(a T:Wo ,c;. Wo ) 



o(||w- w 0 ||) = 0, 


(4.3) 


where a t- :Wo stands for the gradient of we> a T:w and c x:w for the Jacobian matrix of 
w i-4 Ct- ; w , respectively, both taken at w = Wq. In order to express this equation into the 
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equivalent quantile formulation in (2.4) and (2.5), note that we have b.,- ;Wo = u — r u Ci- ;Wo , 
which entails t>T- ;Wo = — r u CT- ;Wo , where b.,- ;Wo is the Jacobian matrix of w4 b T;w at 
w = Wq. Neglecting the o(||w — wq||) term, (4.3) then rewrites, after some algebra, as 


(h' — Wnb(_ 

v t;wq U t; Wo 


)y 


(4.4) 


(®t;w 0 W 0®t;w O i® T ;woI (veCC.,- ;Wo ) ) | W 

w <8> (r(,y) 


= 0 . 


Letting x := (1, w')' := (l,w', (w ® T^y)')', the latter equation is of the form (3' T y — 
a' T (l,w'Y = 0, with (3' t u = (b(_. Wo - w(,b(.. Wo )u = b;. Wo u = 1 since b(- ;Wo u = 
—c(.. Wo r' u u = 0. Comparing with (2.4), this suggests a local linear approach based on 
weighted quantile hyperplanes (in the mp-dimensional regressor-response space associ¬ 
ated with the augmented regressor x, that is, the (w', y')'-space), yielding weighted 
empirical quantile hyperplanes with equations 


(”)'v - oS n) ' 


/3V;i'y 


;(i, 


= 0 , 


(4.5) 


based on the same sequences of weights := (Wj), i = 1,..., n, as in Section 4.1. 

Interpretation of the results, however, is easier from (4.3) than from (4.4). The left-hand 
side of (4.3) indeed splits naturally into two parts of independent interest: (i) the first 
one, made of the first two terms, yields the equation of the Wo-conditional T-quantilc 
hyperplane of Y, hence provides the required information for constructing the empirical 
wo-cuts, whereas (ii) the second part (the third term) provides the linear (linear with 
respect to (w — wo); actually, bilinear in (w — Wo) and T^y) correction required for a 
small perturbation (w — wp) of the value of the conditioning variable. Therefore, the 
important quantities to be recovered from and are estimations of these two 
parts, which are easily obtained by 

(i) letting w = Wo in (4.5), which yields the equation 

~ «i^(l> w o, (w 0 ® I^y)')' = 0 

of an empirical hyperplane providing an estimate of the two first terms in (4.3), 
namely, the wo-conditional x-quantile hyperplane; 

(ii) subtracting the latter equation from (4.5), which provides the bilinear correction 
term. 

The bilinear nature of the local approximation in (ii) is easily explained by the fact that, in 
general, unless the Wo-conditional and w-conditional r-quantile hyperplanes are parallel 
to each other, no higher-dimensional hyperplane can run through both (for instance, two 
mutually skew non-intersecting straight lines in R 3 do not span a plane). Omitting the 
additional W <g> (r^Y) regressors (in (i) above) may result in inconsistent estimators 
of the wo-conditional r-quantile hyperplanes. The resulting regions in R m+P_1 , are not 
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polyhedral anymore, but delimited by ruled quadrics (hyperbolic paraboloids for m = 2 
and p — 1 = 1), the intersections of which with the w = wo hyperplane yield polyhedral 
estimated w 0 -cuts; see Figure 4(b). 

The local bilinear approach is more informative than the local constant one, and should 
be more reliable at boundary points; the price to be paid is an increase of the covariate 
space dimension (due to the presence of the regressors W and W (g> (r^Y) in (4.5)), hence 
of the number of free parameters ( mp instead of m for the local constant method). Note 
however that the smoothing features of the problem, namely the dimension of kernels, 
remains unaffected (p — 1, irrespective of m). 

5. Asymptotics 

Throughout this section, we fix wo and r = ru, hence also a,T- ;wo and c x;wo , and write, 
for simplicity, Y u := u'Y and Y^ := T^Y. Asymptotic results require some regularity 
assumptions on the density /, the kernel K , and the bandwidth h n . 

Assumption (Al). 

(i) The n-tuple (W',Y')' ; t = l,... ,n is an i.i.d. sample from (W',Y . 

(ii) The density w H>• / w (w) of W is continuous and strictly positive at Wo- 

(iii) For any t £ R m_1 , there exist a neighborhood Bt of a T)VJo + c(_. W[i t and a 

neighborhood B t (wo) of wp such that s H > / iu l Y u= t ’ W=w (s) continuous over 

s £ Bt, uniformly in w £ B t (wo), and w i-> u ^ Yu =t,w=w (s) is continuous over 
w G B t (w 0 ) for all s £ B t . 

(iv) The density / Y " l w=w (t) of Y„ conditional on W = w is continuous with respect 
to w over a neighborhood of wo, except perhaps for a set of t values of / Y “ - 
measure zero. 

(v) The m x m matrix 



is finite and positive definite. 


Assumption (A2). The kernel function K 

(i) is a compactly supported bounded probability density over such that 

(ii) f KP _ i wAf(w) dw = 0 and ptf := f RP -i ww'Ar(w) dw is positive definite. 


Assumption (A3). The bandwidth h n is such that limn^oo h n = 0 and limn^oo nh v n 1 = 
00 . 

The conditions we are imposing in Assumption (Al) are quite mild. For example, 
Assumption (Al) (ii) is the same as Condition (A)(iii) in [11] and Assumption (Al)(i) in 
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[17]; Assumption (Al)(iii)-(v) are similar to Condition (A)(i, iv) in [11] and Condition 
(Al)(ii) in [17], where the existence and positive-definiteness ensure the invertibility of 
G x;Wo in Theorem 5.1. 

Assumptions (A2) and (A3) on the kernel function and the bandwidth also are quite 
standard in the nonparametric literature. For example, any compactly supported sym¬ 
metric density function satisfies Assumption (A2). The compact support of K in Assump¬ 
tion (A2) is only a technical assumption to simplify the proof of theorems. In practice, 
Gaussian kernels can be considered; indeed, at the cost of more involved proof, the com¬ 
pact support assumption in Theorems 5.1 and 5.2 can be replaced with the assumption 
that both Cq := f RP -i K 2 (w) dw and := / RP _i ww'AT 2 (w) dw are finite. As for As¬ 
sumption (A3), it is the usual one in the i.i.d. setting; see Section 6 for a discussion. 

Let A’O := (1,Y„')' and X „ := (1,Y„')' <g> (1, (W — wo)')', where the superscript c 
and £ stand for the local constant and local bilinear cases, respectively. For (W, Y) = 
(Wi.YO, we use the notation Yi U , Y^, X^ u , X f iu , etc. in an obvious way. 

Referring to (4.2) for the notation, the parameter of interest for the local constant case 
is 6 C = 0(_ ;Wo := (a T ;woj C T;w 0 ) , > whereas, in the local bilinear case (see (4.3)), we rather 
have to estimate 


6 = 6 T , W 0 ;=VeC 


a -r; w 0 C r;w 0 
a-T;w 0 C t;w 0 


(5.1) 


The local constant and local bilinear methods described in the previous sections provide 
estimators of the form Q ( ^ := (a, c')' and 


t) := vec 



(5.2) 


(we should actually discriminate between (a, c') = (a c , c c ') and (a, c') = (a e ,c e '), but will 
not do so in order to avoid making the notation too heavy); those estimators are defined 
as the corresponding minimizer 6 ' of 

n 

J2K h (W i -w 0 )p T (Y iu -e r, X r i J, r = c,L (5.3) 

i=l 


~ c(n) ^ £(n) 

The following result provides Bahadur representations for 6 and 0 


Theorem 5.1 (Bahadur representations). Let Assumptions (Al), (A2)(i) and (A3) 
hold, assume that w i-> (aT- ;w ,c(.. w )' is continuously differentiable at wo, and write 
ip T (y) :=r — I[y < 0]. Then, as n —> oo, 

nh p n - 1 M r h (e r{n) - e r ) 

= k( W '~ Wo ) MZI u (d))( M ^- 1 ATL + op(l), 

V n h n j = i V n / 


(5.4) 
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with G t;wo defined in Assumption (Al)(v) (the result for the local constant case does 
not require (A2)(ii)j. 

This result, along with Assumption (A4) below, entails the asymptotic normality of 

*r(n) 

0 , r = c,£. That assumption deals with the existence, at w = wq, of the second 

derivatives ofwi-> (a x;w , c^. w )'. With Ci- ;w =: (ct- ;w .i, ..., c T;W)Tn _i) , ) denote by ai- ;w and 
Ct- ; w ,j the {p— 1) x 1 vectors of first derivatives and by & T:W and c T;w j the (p— 1) x (jp— 1) 
matrices of second derivatives (when they exist) ofwH> a.,- ;w and w i—> c T]VJ j, respectively 
(recall that a x:w and c T VJ = (c T;W) i, ..., were already defined in page 15). 

Finally, write c(_. w for the (p — 1) x (m — l)(p — 1) matrix (c x;W) i,..., Ct- ;W;? 71 _i). 

Assumption (A4)- 

(i) The function w i-> (a x;w , c(.. w )' is twice continuously differentiable at w = wp, 
that is, aT- :w and c T;w exist in a neighborhood of wp and are continuous with 
respect to w at wp. 

(ii) The function w4/ w (w) is continuously differentiable at w = wp, that is, the 
(p — 1) x 1 vector of first derivatives of / w , / w (w), exists in a neighborhood of 
Wo and is continuous with respect to w at Wo. 

The following matrices are involved in the asymptotic bias and variance expressions of 
the asymptotic normality result in Theorem 5.2 below. Define 



(5.5) 


: =t(1-t)/ w (w)t^.. 


(5.6) 



and, for r = c,£, 



(5.7) 


x 



® B 


'w;0 



dt 


where (putting Ct- :W! q := a T;w ) B ^,. 0 is the lxm matrix with jth entry 
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and B^. 0 denotes the p x m matrix with (i,j) th entry 


— tr 


c x;w ,j-i / Wi-i'ww 1 K(w) dw 
Jrp- 1 


i = 1,... ,p, j = 1, 


here, we wrote w = (w±,W 2 , • • •, w P - 1 )', wq = 1. We then have: 


Theorem 5.2 (Asymptotic normality). Let Assumptions (A1)-(A4) hold. Then, for 
r = c,£, 

4^(0,s; 0 ), (5.8) 

c, 

as n —>• oo, where —> denotes convergence in distribution (the result for the local bilinear 
case does not require Assumption (A4)(ii) / ). 

Remark 5.1. The local bilinear fitting has an expression of bias that is independent 
of / w . In contrast, the local constant fitting has a large bias at the regions where the 
derivative of / w is large, that is, it does not adapt to highly-skewed designs (see [10, 12]). 
Another important advantage of local bilinear fitting over the local constant approach 
is its much better boundary behavior. This advantage often has been emphasized in 
the usual regression settings when the regressors take values on a compact subset of 
For example, considering a univariate random regressor W (p = 2) with bounded 
support ([0,1], say), it can be proved, using an argument similar to the one developed in 
the corresponding proof in [10], that asymptotic normality (with the same rate) still holds 
at boundary points of the form ch n , where c £ , with asymptotic bias and variances 

of the same form as in the local bilinear (r = i) versions of (5.7) and (5.6), with p = 2, 
Wo replaced by wq = 0 + , and f RP -i by f_ ; see, for example, page 666 of [17]. 

Remark 5.2. In practice, we may be concerned with the estimation of the quantile 
regression functions at different r’s simultaneously. Restricting to the estimation of 
(0^ i;Wo ,0^ 2 . Wo )', it can be shown by proceeding as in the proof of Theorem 5.2 that 

(^r i; wo>4 2 ;wo)' is asymptotically normal with a block-diagonal asymptotic covariance 
matrix, that is, 6 TlV , 0 and 6 t- 2;wo are asymptotically independent for t\ ^ T 2 . 


6. Bandwidth selection 

While the choice of a kernel, as usual, has little impact on the final result, selecting the 
bandwidth h is more delicate. A full plug-in estimator in principle could be derived from 
the asymptotic normality result of Theorem 5.2, along the same lines as, for instance, in 
Zhang and Lee [46], who do it for mean regression. Such an approach, however, requires 
the estimation of several conditional densities, hence raises further problems, besides be¬ 
ing computationally quite heavy, certainly when several values of r are to be considered. 
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A simpler heuristic rule is thus preferable; the one we are describing here is adapted from 
[45], where it is proposed in the context of single-output quantile regression. 

Without loss of generality, we restrict to p — 1 = 1 for notational simplicity, writing 
W and w for W and w, h for h n and Oh = (a£ ;u , 0 ,cj ’:'. Wo )' for the estimator of 6 = 
{. a T\woi c ' T - Wo )' associated with bandwidth h, respectively. Throughout, the kernel K is 
some symmetric density function, such as the standard normal one. The objective is 
to minimize, with respect to h, the asymptotic mean square error which, in view of 
Theorem 5.2 with p — 1 = 1, after some straightforward algebra takes the form 


MSE(h) = E(6 h - 0)'(9 h - 0) « ]h*B 2 + \v T , 

4 nh 


( 6 . 1 ) 


with 


Bl := (/xf f + £ c 2 t . WoJ j and V T := ti-(G^ Wo G Wo G^ Wo ) 


where Ct- W o ,j is the second-order derivative with respect to wo of the jth component of 
c T;wo , G t - Wo is defined in Assumption (Al)(v), and 

The minimizer h T of (6.1) satishes 


hi = 


Vt 

nB £ 




n(p«rr(w 0 )(a* ;v 


+ E7= 


m~ 1 "2 

3 =1 


( 6 . 2 ) 


so that for any r 1 , 7 * 2 , 


/- \ /"2 1 \—yTn — 1 ..9 

n(l - n) (a-r 2 , Wo + Ej=1 c “ 


T2,Wq 




r r- 1 

Wo yjX ' w 0 ^Ti ,W 0 


t 2 (1-t 2 ) (di+ZTJi 1 ? 


T 1 ,W 0 


_j ) t r (Gt 2 >Wo G Wo G T2 tWo ) 


(6.3) 


As in [45] , we assume that a TUyWo and c TU ,w 0 do not depend on r (an assumption we do 
not make on a TUyWo and c TUjWo ). If f Yu ^ Yu =t ’ w = w o were a normal density with mean /it.w 0 
and variance a\ Wg , denoting by <f> and $ the standard normal density and distribution 
functions, respectively, we would have / iu l Y >* =t ’ w=Wo {a T - Wo + c(_ :u , o t) = ct^ o <()(4>“ 1 (t)), 
hence 

G t . W0 =^-\1 /2)) ^ dt 


1 ('r 2 )) 2 

tr(G ts^iuoGw 0 Grj.um) “ l_0(^ >_1 ( T l)). 


and 
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If we further assume that a\ Wq = ct 2 o , (6.2) for r = 1/2 takes the form 


a/2 


/ C* __ 

2 U (/*?) 2 f w (w 0 )(dl /2 . wo + E ™'! 1 c 2 /2;uW ) 


(6.4) 


while (6.3) yields 

/ h Tl \ 5 _ ri(l -ti) (^(41 1 (r~ 2 ))) 2 

VW ^(l-^^-Hn))) 2 

hence, for t 2 = u/2, ft® = ( 2 / 7 t)r(l - t)(</( 4>~ 1 (r))) _ 2 /i^ /2 . 

This latter expression still is not readily implementable. However, (6.4) bears a strong 
relation to the optimal bandwidth value h F z obtained by Fan and Zhang in Theorem 1 of 
[13] for the estimation of the conditional mean in the varying-coefficient linear regression 
model Yu = a(W) + c(bF)'Y,/ + e u with Var(e u | W = wq) = cr^ o , namely 


Cg _tr(G w l)v 2 W0 


(2/n)h 5 u/2 . 


We therefore propose, for t = ru, the bandwidth h T provided by 

=r (l — T )(0(^ _1 ( r ))) _2 ^F Z > (6-6) 

where, for the selection of /ifz, we may rely, for instance, on the plug-in rule developed 
by [46]. 

This rule (6.6) can be regarded as the combination of a plug-in strategy and a rule-of- 
thumb: plug-in strategy in the selection of hpz but rule-of-thumb for the dependence on 
t. It furthermore implies that the selected h T has the same n^ 1 / 7 rate of convergence as 
h F z (see [46]). 


7. A real data example 

In order to illustrate the data-analytic power of the proposed method, we consider the 
“body girth measurement” dataset from [20], that was already investigated in HPS. The 
dataset consists of joint measurements of nine skeletal and twelve body girth dimensions, 
along with weight, height, and age, in a group of 247 young men and 260 young women. 
As in HPS, we discard the male observations, we restrict to the calf maximum girth (Yi) 
and the thigh maximum girth (Y 2 ) for the response, and use a single random regressor W 
(weight, height, age, or BMI). Figures 5 and 6 provide cuts - for the same w- and r-values 
as in HPS - obtained from the proposed local constant and local bilinear approaches, 
respectively. 

These cuts confirm most of the global analysis conducted in HPS and moreover reveal 
some interesting new features. For instance, 
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Figure 5. Four empirical (local constant) regression quantile plots from the body girth mea¬ 
surements dataset (women subsample; see [20]). Throughout, the bivariate response (Yi. Y^)' 
involves calf maximum girth (Yi) and thigh maximum girth (Y 2 ), while a single random re¬ 
gressor is used: weight, age, BMI, or height. The plots are providing, for t= 0.01, 0.03, 0.10, 
0.25, and 0.40, the cuts of the local constant regression r-quantile contours, at the empirical 
p-quantiles of the regressors, for p= 0.10 (black), 0.30 (blue), 0.50 (green), 0.70 (cyan) and 0.90 
(yellow). The n = 260 data points are shown in red (the lighter the red color, the higher the 
regressor value). The results are based on a Gaussian kernel and the bandwidth H = 3cru,n _1,/5 , 
where a w stands for the empirical standard deviation of the regressor (the corresponding cuts 
obtained from linear regression are provided in Figure 7 of HPS). A color version of this figure 
is more readable, and can be found in the on-line edition of the paper. 


(a) for the dependence on weight, the local bilinear approach confirms the positive 
trend in location, the increase in dispersion, and the evolution of “principal direc¬ 
tions” (as weight increases, the first “principal direction” rotates from horizontal 
to vertical), and it further indicates that high weights give rise to simultaneously 
large extreme values in Yi and Y^. The differences, for low and high values of the 
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(weight) (age) 

y 2 y 2 






Figure 6. Same quantities as in Figure 5, here obtained from the local bilinear approach, with 
the same kernel and bandwidth as in Figure 5 (the computation was based on 360 equispaced 
directions u £ (S 1 ). A color version of this figure is more readable, and can be found in the on-line 
edition of the paper. 

covariate (weight), between the contours resulting from the local bilinear and local 
constant approaches illustrate the sensitivity of the latter to boundary effect; 

(b) for the dependence on age, the local regression quantile regions, parallel to their 
global HPS counterparts, do indicate that the location and the first principal 
direction (along the main bisector) are constant over age. Still as in HPS, the 
local approaches confirm that the shapes of outer contours vary quite significantly 
with age, indicating an increasing (with age) simultaneous variability of both calf 
and thigh girth largest values. Now, compared to HPS, the local bilinear approach 
further shows that young women present a large simultaneous variability of both 
calf and thigh girth smallest values; 

(c) for the dependence on height, the local methods confirm the regression effect spe¬ 
cific to inner contours. The local bilinear approach further shows that there is also 
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a regression effect for outer contours that, as height increases, get more widespread 
in the direction u (corresponding to simultaneously large values of both responses). 

Limited as it is, this short application demonstrates how the local quantile regression 
analysis proposed here complements and refines the findings obtained from the global 
approach introduced in HPS by revealing the possible non-linear, heteroskedastic, skew¬ 
ness ... features of the distributions of Y conditional on W = w. We refer to [32] for a 
further application, in the context of bivariate growth charts. 

We conclude this section with a brief discussion of the computational aspects of the 
proposed methods. In principle, any quantile regression/linear programming/convex op¬ 
timization solver can be used for that purpose. The exact local constant quantile/depth 
contours can be computed for any uiq via a weighted version of the HPS algorithm 
see Paindaveine and Siman [34] for a detailed description of its Matlab implementation 
and its computation cost. The local bilinear contours, for given wq, are determined by 
considering a fixed number M of directions; their computation then is as demanding 
as M times the standard simple-output quantile regression with the same number of 
regressors; see Koenker [26] for computational and algorithmic details. 


8. Conclusion 

In this paper, we propose a definition of regression depth as the conditional depth of an 
to- dimensional response conditional on a p-dimensional covariate. We also propose local 
constant and local bilinear methods for the estimation of conditional depth contours, and 
establish the consistency and asymptotic normality of the estimators. As a descriptive 
tool, the resulting contours provide a powerful data-analytic tool, while our asymptotic 
results guarantee that, for n large enough, those contours are able to detect any covariate- 
dependent feature of the conditional distributions of the response. An important domain 
of application for such methods is in the analysis of multiple output growth charts, 
where current practice is essentially restricted to a marginal approach that neglects all 
information related to joint conditional features. 


Appendix: Proofs of asymptotic results 


We actually restrict to the local bilinear case (proofs for the local constant case are 
entirely similar). The proofs rely on several lemmas, and require some further notation. 
Referring to (5.1) and (5.2), define 


6 £ = vec 


( ®r;wo 
^t;wq 


' ;w 0 
• / 


=: vec 




and 



Denote by zu i = (ai,^)' and zu i = (di,ci)' two arbitrary vectors of K m , by ZU 2 = 
(a 2 ,C 2 )' and = (e^c^)' two arbitrary m x (p — 1) matrices. Let H n := \jnhn _1 and 
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(n) := HnM.fr vec 
p := H„ M£ vec 
ip := vec 


(®Wo ro w 0 ) 




,)V ’ 


(ixt! -ct wo )' 

(ro 2 -ro W0 )' 

(ct! -u7 W0 y 

(ro 2 ~ro W0 )' 


(A.l) 


and note that p^ = Vnhn 1 M£(0 — 9 e ). Define Whi := (W* — w o)/h n , Khi := 

K( W hi ) and ATl iu := (M£) _1 = (1, Y&)' ® (1, W(J'. 

Let = Z- U (9 e ) := Y iu — 9'' X ( : m as in Theorem 5.1, and define 

d'ai '■ = h n k T . vf0 'Whi + ^n(veCCT-; Wo ) (Y,^ u 0 hi )j 

4(V) “ u and [/„ = C/ ni ( V ) := T ni + H^p'X £ m 

(note that the latter two quantities depend on the choice of ®i and •c^)- The following 
identities will be useful in the sequel: 


^1U — Yu (Qt;w 0 + c T ;w 0 Aiu) 


r_L ' 
m. 

^ni(V > ) = ~ (dr;w 0 + c r;wi ) '^ra) — Uni{p) 

= Yiu - (vec(ro 1 ,-d7 2 ) , ) , ^ u . 


(A.2) 

(A.3) 


Let C be a generic constant whose value may vary from line to line. Since K is a bounded 
density with a bounded support, we have, whenever Khi > 0, 


-'ill). 


l|Wfcj|| < C and || Af£j U || < C(1 + ||Y' 
and, when moreover ||y>|| < M, 

\T n i\ < Ch n (l + ||YA (I) and \U ni \ < C(h n + + ||Y, J 

^£(n) 

It follows from the definition of 9 as the argmin of (5.3) that 

n 

p {n) = argmin^ K hi p T {ZY{p)). 
vgRmp , =1 

Recalling that ip T (y) := r — I[y < 0], define 

n 

V„(v>) := H- 1 K hi MZ* ni (p))Xi iu . 


(A.4) 

(A.5) 

(A.6) 

(A.7) 


In order to prove Theorem 5.1, we need the following lemma. 
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Lemma A.l. Let V„(-):K mp — > be a sequence of functions that satisfies the fol¬ 

lowing two properties: 

(i) for all A > 1 and all ip € R mp , -ip'V n (\ip) > -ip'V n (ip) a.s.; 

(ii) there exist a p x p positive definite matrix D and a sequence of mp-dimensional 
random vectors A„ satisfying ||A„|| = Op(l) such that, for all M > 0, 
® u P||i/;||<m l|V„('*/’) + (G-^wo ® D )ip - A„ || =o P (l), where G T:Wo is given in As¬ 
sumption (Al)(v). 

Then, ifip n is such that ||V„('0„)|| =op(l), it holds that \\ip n \\ = Op(l) and 

ip n = (G t;Wo ®D) _1 A„ +o P (l). (A.8) 

Proof. The proof follows along the same lines as in page 809 of [27]; details are left to 
the reader. □ 


The proof of Theorem 5.1 consists in checking that the assumptions of Lemma A.l 
hold for V„ defined in (A. 7); we use the following lemma. 


Lemma A. 2 . Under Assumptions (A1)-(A3 ) , for any (tp, ip) such that max(||y>||, ||v?||) < 
M, and n large enough, 


E [K hi \ip r {z* ni {v)) - MKiivm < CV[K hi I[\Z* ni m < CH-'Wv - £||]] 

<Ch^H- x \\ V -ip\\ 


(A.9) 


and 


E [Kl\MZ*M) - MKifr)) I 2 ] < CE[Kll[\Z* ni m < CH- 1 1| V - ¥>||]] 

<Ch*r 1 H; 1 \\'p-<p\\. 


(A.10) 


Proof. The claim, in this lemma, is similar to that of Lemma A.3 in [17], which es¬ 
sentially follows from the same argument as in the time series case (cf. [31]). Details, 
however, are quite different. It follows from (A.4) that 


= K hi \i[z* ni (v)<o\-i[z* ni (ip)<o}\ 

= K hi \i[z* ni ((p) < H-\v - q>)’x Lu] - i[z* n m < 0]| 

< K hi I[\Z* i (tp)\ < CH-^Wip — ¥>||(1 + HY^jII)]. 

Hence, from (A.3) and the mean value theorem, we obtain 

< E[K hi I[\Z* ni (ip)\ < CH-^ip - $>11(1 + ||Y,i||)]] 

= E[K hi P[\Z* ni (tp)\ < CH-'Wv ~ £11(1 + l|Y,4||)|Y^, WJ] 
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= E[K hi F Y ^’ w \a T . W0 +c;. W0 Y£ + U ni {(p) + CH~ l \\<p - £||(1 + ||Y,4||))] 

- E[K hi F Y ^ ’ w )(a. ;wo + c;. W0 Y,4 + U ni {ip) ~ CF[- l \\ip - £||(1 + ||Y^||))] 
<E[K hi (l+ ||Yi||)/ Fu K Y u- w )(a x;W0 +e T , V0 Y^ +Unite) 

+ \CH~ l \te — ¥>||(1 + ||Y^||))] 

x2CH~ 1 \\<p-<p\\ 

for some A € (—1,1). Assumptions (Al)-(A3), together with (A. 5), therefore yield that, 
for ip, Cp G {ip: ||y>|| < M} and n large enough, 


E[K hi \i, T {z* n ite))-Mz* n item 

<CH- l \\ip-q,\\E\K h i [ (H-||t||)/ y “K Y ^= t ’ w )(a x;W 0 +c; ;W 0 t)/ Y -l w (t)dt 

J r- 1 

=c^ n - i H- i \\ip-ip\ir^ 0 ) 

x [ (l+||t||)/ Y “'^= t ' w = w °)(a T;wo +c; ;wo t)/ Y ul w = w o ( t)dt, 

J R m -! 


which establishes (A.9); (A. 10) follows along similar lines. 


□ 


Lemma A.3. Under Assumptions (A1)-(A3), we have that, as n—>oo, 


sup ||V n (<p) — V„(0) — E[V„(<p) — V„(0)]|| = o p (l). (A.11) 

Ml <m 

Proof. The proof of this lemma is quite similar, in view of Lemma A.2, to that of Lemma 
A.4 in [17]. Details are therefore omitted. □ 

Lemma A.4. Under Assumptions (Al)-(A3), we have that, as n —> oo, 


sup ||E[V„(y>) — V„(0)] + (G r;Wo ® D)<p|| = o(l), (A.12) 

llvll <M 

where D = / w (w 0 ) diag(l, H%). 

Proof. Note that V„(<p) - V„(0) = H~ l £? =1 K hl [^ T (Z* ni (ip)) - Vv (Zf u )\X e hiu . It fol¬ 
lows from (A.2) and (A.3) that 

E[V n (^) - V n (0)] - nH-^KMZl < 0] - I[Z* ni (ip) < 0])X e hiu } 

= H n h~^E{K hl {F Y ^™\a^ 0 + c;. W0 Y,4 + T ni ) 

_ pYM .w )(ax;wo + c ; ;wo Yi + Uni))X l hiu ]. 
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Then, similar to the proof of Lemma A.2, by the mean value theorem, since U n i — T n i = 
there exists £ £ (0,1) such that 

sup ||E[V n (v>) - V„(0)] + (G t:W0 ® D)vj|| 

IMI <m 

= sup II (Gt- ;Wo ® T>)ip 

M<M 

h^-i)v[ Khi f Y *\rtW (« T;W0 + c' T;wo + T ni + ^H- 1 X^tp) X e hiu AT* u¥ >] || 

= sup ||{(G x;wo ®D) -h-^- l ^[K hi f Y ^^\a T . tV , 0 +c' r . V0 Yt u )X{ ia X% iu \}<p 
IMI <m 

- /i-^- 1 )E[^(/^l^’ w )(a x;W0 + c;. W0 Y,i + T ni + 

_/^ICvi,w) (aT;w 0+ c; ;wo Y t 4))Yi m Yi' uV ]|| 

< G||(G t;W0 ® D) - h-^E[K h J Y ^^ ] (a T , W0 + c^Y^X^X^W 

+ C sup h-^-MK hi \f Y ^’ W \a T . W0 + c;. W0 Yi + T n i + ZH?X &. u¥? ) 

Ml <M 

- / F " l(Y - W) (a T; w 0 +c; ;W0 Y i L u )|||YL u Yl' u ||] =0(1), 

where we used Assumptions (Al) and (A2), together with (A. 5). □ 

Lemma A.5. Let Assumptions (A2) and (A3) hold. Then the random vector de¬ 
fined in (A..1) satisfies ||V n (<^"))|| =op(l). 

Proof. The proof follows from a argument similar to that of Lemma A.2 on page 836 of 
[36], □ 


Lemma A.6. Under Assumptions (A1)-(A3), for any d £ K. mp , 
lim E[{d'(V„(0) — E[V„(0)])} 2 ] 

n—too 

= t(1 -t)/ w (w 0 ) f f ([(1,t') ® (l,w')]d) 2 / Y “ |W=Wo (t)A' 2 (w)dtdw. 
Jrp - 1 J R m -! 

Proof. Set Vi = K h ^ T {Z* iu )d'X* hivl = K hi ip T (Zf u )[(l,Y^) ® (1, W(J]d. A simple cal¬ 
culation yields 

E[{d'(V„(0) - E[V n (0)])} 2 ] = H~ 2 nVai[vi] = h~^ Var^]. 

Note that, for k = 1,2, 

lim h-^m^i[z( u < m'<i») k ] 


(A.13) 
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= lim / i -( p - 1 )E[At 1 F Y ^( Y - w )(a T;Wo +4 :Wo Y 1 J u + T nl )(d'A’l lu ) fc ] 

n—>-oo 

= r/ w (w 0 ) f f A fc (w)([(l,t , )(8>(l,w')]d) fc / Y - |w=Wo (t)dtdw, 

JRP - 1 JR ” 1 - 1 

which leads to 
lim 

n—>oo 

= I™ h-^E[K h i(T-I[Z e lu <0])(d'Xi lu )} 

n—> oo 

= ( r - T )/ W ( w o) [ [ A(w)([(l,t')® (l,w')]d)/ Y ” |w=Wo (t)dtdw = 0 

JRP - 1 J R m 1 


lim h-^-V E[5?] 

n—>oo 

= lim / l -^- 1 )E[At 1 (r 2 -2r/[Zf u <0] + /[^ u <0])(d'A’l lu ) 2 ] 

n—>• oo 

= T (1 -'r)/ W (w 0 ) [ [ A 2 (w)([(l,t') ® (l,w / )]d) 2 / Y ” |w=Wo (t)dtdw. 

JRP - 1 JR ”*- 1 

Therefore, 

lim Var[hi] 

n—>-oo 

= lim(^( p - 1 )E[0 2 ]-/ l -( p - 1 )(E[i) 1 ]) 2 ) 

n—>• oo 

= r(l- t)/ w (w 0 ) f f ^ 2 (w)([(l,t')®(l,w , )]d) 2 / Y ” |w=W0 (t)dtdw, 

JRP - 1 JR ”*- 1 

which, together with (A.13), establishes the result. □ 

Proof of Theorem 5.1. The proof consists in checking that the conditions of 
Lemma A.l are satisfied. Lemmas A.3 and A.4 entail that Lemma A.l(ii) holds, 
with D = / w (w 0 )diag(l,/rf) (yielding (G T;Wo <g> D) -1 = r4 ;w J and A„ = V„(0) = 
H ~ 1 l Kh.iipriZi u)^Luj which, by Lemma A.6, is Op(l). As for Lemma A.l(ii), the 
fact that 


A H- -¥>'V„(A<^) = H - 1 £ K hz MZL ~ AIL" V'^Lu)(-^^Lu) 

i—1 


is non-decreasing directly follows from the fact y ip T (y) is non-decreasing. Since 
(Lemma A.5 and Assumptions (A2) and (A3)) ||V„(y>l n ))|| is op(l), Lemma A.l ap¬ 
plies, which concludes the proof. □ 
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Proof of Theorem 5.2. On the basis of the Bahadur representation of Theorem 5.1, 

~ l(n) 

the asymptotic normality of Q follows exactly as in the corresponding proofs for 
usual nonparametric regression in the i.i.d. case (see, e.g., [10]), yielding the asymptotic 
normality with the bias (i.e., the expectation) of the first term on the right-hand side of 
(5.4) as 


E 


Vnhn 1 


°=Y,KhMZl)X 


n i= 1 


\/ nhn 


- r nE[K hl MZi u )Xiiu\ 


= vi:w 0 \jnh p n X hJ p a) E[^i(F Yu l (Y “’ w )(a T ; W + c;. w Y( u ) 

- F Yu 1 (Y “ ’ w) (a T;wo + c; ;W0 Y y + T nl ))X £ lu ] 


= \/nh p „~ 1 (^ B 


o (hi) 


where the last equality is derived from a first-order Taylor expansion of y i-> iOldOG ,x ) (y) 
and a second-order Taylor expansion ofw4 (ot- ; w , c(. ; w )' at w = Wo (these expansions 
exist in view of Assumptions (Al) and (A4)). The o(/i£) term is taken care of by As¬ 
sumption (A3). The asymptotic variance of the theorem readily follows from Lemma A.6. 
Details are omitted. □ 
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